Last modified: 2026-01-08: 3:13:58 (UTC)
---
title: "Statistical Computing in R"
format:
  html: default
  revealjs:
    output-file: intro-to-R-slides.html
  pdf:
    output-file: intro-to-R-handout.pdf
---
# Statistical computing in R
## Online R learning resources
There are an overwhelming number of great resources for learning R; here are some recommendations:
- [*The RStudio Education website*](https://education.rstudio.com), especially:
  - [*Finding your way to R*](https://education.rstudio.com/learn/)
- *R for Epidemiology* (@r4epi)
- *The Epidemiologist R Handbook* (@epirhb)
- *Practical R for Epidemiologists* (@practicalr4epi)
- *R for Data Science* (@r4ds)
- *Advanced R* (@wickham2019advanced)
- *R Graphics Cookbook* (@changRgraphicscookbook)
- *R Packages* (@rpkgs)
- @NahhasIntroR (same author as @NahhasIRMPHR)
- @phdswR (previously @appliedepiusingR); the author is State Public Health Officer and Director of the California Department of Public Health (<https://drtomasaragon.github.io/>)
- *SAS and R* (@kleinman2009sas)
- The ["sassy system"](https://r-sassy.org/) is "an integrated set of packages designed to make programmers more productive in R, particularly those with a background in SAS® software. The system leverages useful concepts and thought patterns to create a more efficient and satisfactory R programming experience."
  * In particular, the [procs](https://cran.r-project.org/web/packages/procs/) package in R provides
    versions of common SAS procedures,
    such as 'proc freq', 'proc means', 'proc ttest', 'proc reg', 'proc transpose', 'proc sort', and 'proc print'
- *R for SAS and SPSS users* (@muenchen2011r)
- *Building reproducible analytical pipelines with R* (@rapR)
- *Posit Recipes: Some tasty R code snippets*: <https://posit.cloud/learn/recipes>
## UC Davis R programming courses
There are several dedicated UC Davis courses on R programming:
- [BIS 015L](https://catalog.ucdavis.edu/search/?q=BIS+015L):
  Introduction to Data Science for Biologists
  - see course materials at
    <https://jmledford3115.github.io/datascibiol/>
- [ENV 224](https://catalog.ucdavis.edu/search/?q=ENV+224)/
  [ECL 224](https://catalog.ucdavis.edu/search/?q=ECL+224):
  Data Management & Visualization in R
  - see lecture videos and course materials at
    <https://ucd-r-davis.github.io/R-DAVIS/>
- [ESP 106](https://catalog.ucdavis.edu/search/?q=ESP+106):
  Environmental Data Science
- [STA 015B](https://statistics.ucdavis.edu/expanded-descriptions/15b):
  Introduction to Statistical Data Science II
- [STA 032](https://statistics.ucdavis.edu/expanded-descriptions/32):
  Gateway to Statistical Data Science
- [STA 035A](https://statistics.ucdavis.edu/expanded-descriptions/35A):
  Statistical Data Science
- [STA 035B](https://statistics.ucdavis.edu/expanded-descriptions/35B):
  Statistical Data Science II
- [STA 141A](https://statistics.ucdavis.edu/expanded-descriptions/141A):
  Fundamentals of Statistical Data Science
- [STA 242](https://statistics.ucdavis.edu/expanded-descriptions/242):
  Introduction to Statistical Programming
- [ABG 250](https://catalog.ucdavis.edu/search/?q=ABG+250):
  Mathematical Modeling in Biological Systems
- [PSC 203A](https://catalog.ucdavis.edu/search/?q=PSC+203A):
  Data Cleaning & Management in the Social Sciences
- [PSC 203B](https://catalog.ucdavis.edu/search/?q=PSC+203B):
  Data Visualization in the Social Sciences
[DataLab](https://datalab.ucdavis.edu/) maintains another list of courses:
<https://datalab.ucdavis.edu/courses/>
DataLab also provides short-form workshops on R programming and data science:
<https://datalab.ucdavis.edu/workshops/>
## Demographics tables
Demographics tables are important first steps in many data analyses and papers.
The `gtsummary` package is flexible and can probably provide whatever table options you're looking for,
and if not, the developers are usually very welcoming of feature requests.
If `gtsummary` is really not doing what you want,
other packages I've used for demographics tables include:
- <https://cran.r-project.org/web/packages/procs/> (replicates common SAS commands)
- <https://cran.r-project.org/web/packages/arsenal/index.html> (from the Mayo Clinic)
- <https://cran.r-project.org/web/packages/table1/index.html>
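As a minimal sketch of the `gtsummary` workflow, the chunk below builds a small demographics table split by treatment arm. It assumes `gtsummary` is installed and uses the `trial` example dataset that ships with the package; the variables chosen (`trt`, `age`, `grade`) are only for illustration.

```{r}
library(gtsummary)

# `trial` is an example dataset bundled with gtsummary
demo_table <- tbl_summary(
  trial,
  by = trt,                 # one column per treatment arm
  include = c(age, grade)   # variables to summarize
)

demo_table
```

Further arguments of `tbl_summary()` (for example `statistic` and `label`) control which summaries are shown and how rows are named.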
## Writing functions
- Read this ASAP: <https://r4ds.hadley.nz/functions.html>
- Use this as a reference: <https://adv-r.hadley.nz/functions.html>
### Methods versus functions
See <https://adv-r.hadley.nz/oo.html#oop-systems>
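The distinction can be sketched with base R's S3 system. In this illustrative example (all names, such as `describe`, are hypothetical), an ordinary function has a single implementation, whereas a generic function chooses among several methods based on the class of its argument.

```{r}
# An ordinary function: one implementation for every input
describe_plain <- function(x) {
  paste("an object of class", class(x)[1])
}

# A generic function: UseMethod() dispatches on the class of `x`
describe <- function(x, ...) UseMethod("describe")

# Methods: implementations of the generic for specific classes
describe.numeric <- function(x, ...) {
  paste("a numeric vector of length", length(x))
}
describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows")
}

# Fallback method, used when no class-specific method exists
describe.default <- function(x, ...) "something else"

describe(c(1.5, 2))
describe(mtcars)
describe("a string")
```

Familiar functions such as `print()` and `summary()` are S3 generics of this kind, which is why they behave differently for data frames, model fits, and other classes.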
### Debugging code
- <https://adv-r.hadley.nz/debugging.html>
- <https://www.maths.ed.ac.uk/~swood34/RCdebug/RCdebug.html>
## `data.frame`s and `tibble`s
### Displaying `tibble`s
See `vignette("digits", package = "tibble")`
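As a rough sketch of two options that vignette covers (assuming a reasonably recent version of `tibble`), the number of digits shown can be changed for the whole session with the `pillar.sigfig` option, or for a single column with `num()`. The example values are arbitrary.

```{r}
library(tibble)

x <- tibble(value = c(1.23456789, 123.456789))

# Default display: a small number of significant digits
print(x)

# Option 1: raise the session-wide default, then restore the old setting
old <- options(pillar.sigfig = 6)
print(x)
options(old)

# Option 2: fix the number of decimal places for a single column
x_fixed <- tibble(value = num(c(1.23456789, 123.456789), digits = 2))
print(x_fixed)
```

Both options change only how the numbers are printed; the stored values keep their full precision.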
## The `tidyverse`
> The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
- <https://www.tidyverse.org/>
These packages are being actively developed by [Hadley Wickham](https://hadley.nz/) and his colleagues at [Posit](https://posit.co/)^[[the company formerly known as RStudio](https://posit.co/blog/rstudio-is-becoming-posit/)].
Details:
- @welcometidyverse
- @r4ds
- @kuhn2022tidy
## Piping
See [@r4ds](https://r4ds.hadley.nz/data-transform.html#sec-the-pipe) for details.
There are currently (2025) two commonly used pipe operators in R:
- `%>%`: the "`magrittr` pipe", from the [`magrittr`](https://cran.r-project.org/web/packages/magrittr/index.html) package (@magrittr; [re-exported](https://r-pkgs.org/dependencies-in-practice.html#re-exporting) by [`dplyr`](https://cran.r-project.org/web/packages/dplyr/index.html) and others).
- `|>`: the "native pipe", from base R ($\geq$ 4.1.0)
See <https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe>
for a comparison of their behavior.
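The following sketch illustrates one similarity and one difference between the two pipes. It assumes `magrittr` is installed and R is version 4.2.0 or later (the `_` placeholder for the native pipe is not available in 4.1); the toy vector and model are only for illustration.

```{r}
library(magrittr)

x <- c(1, 4, 9)

# Simple case: both pipes pass the left-hand side as the first argument
native_result   <- x |> sqrt() |> sum()
magrittr_result <- x %>% sqrt() %>% sum()
identical(native_result, magrittr_result)

# Placeholders differ: `_` for the native pipe (named arguments only),
# `.` for the magrittr pipe
fit_native   <- mtcars |> lm(mpg ~ wt, data = _)
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)
all.equal(coef(fit_native), coef(fit_magrittr))
```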
### Which pipe should I use?
@r4ds [recommends the native pipe](https://r4ds.hadley.nz/data-transform.html#sec-the-pipe:~:text=So%20why%20do%20we%20recommend%20the%20base%20pipe%3F):
> For simple cases, |> and %>% behave identically. So why do we recommend the base pipe? Firstly, because it’s part of base R, it’s always available for you to use, even when you’re not using the tidyverse. Secondly, |> is quite a bit simpler than %>%: in the time between the invention of %>% in 2014 and the inclusion of |> in R 4.1.0 in 2021, we gained a better understanding of the pipe. This allowed the base implementation to jettison infrequently used and less important features.
### Why doesn't `ggplot2` use piping?
Here's `tidyverse` creator Hadley Wickham's answer (from 2018):
> I think it's worth unpacking this question into a few smaller pieces:
>
> - Should ggplot2 use the pipe? IMO, yes.
> - Could ggplot2 support both the pipe and plus? No
> - Would it be worth it to create a ggplot3 that uses the pipe? No.
<https://forum.posit.co/t/why-cant-ggplot2-use/4372/7>
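In practice, the two operators are combined: data-preparation steps are chained with a pipe, and plot layers are then added with `+`. A minimal sketch, assuming `ggplot2` is installed and using the built-in `mtcars` data purely for illustration:

```{r}
library(ggplot2)

p <- mtcars |>
  subset(cyl != 6) |>                  # data steps are chained with the pipe
  ggplot(aes(x = wt, y = mpg)) +       # the pipe hands the data to ggplot()
  geom_point() +                       # layers are added with `+`, not the pipe
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p
```

This works because the pipe binds more tightly than `+`, so the piped `ggplot()` call is evaluated before the layers are added to it.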
## Grouping operations in `dplyr`
The `dplyr` package provides two approaches for grouping data:
- **Persistent grouping** with `group_by()`:
Creates a grouped data frame that remains grouped for subsequent operations until explicitly ungrouped
- **Per-operation grouping** with the `.by` argument:
Applies grouping for a single operation only,
without modifying the data frame structure
**Recommendation**:
Default to using per-operation grouping with the `.by` argument,
as it is more explicit,
reduces the risk of accidentally operating on grouped data,
and eliminates the need to remember to `ungroup()`.
For a detailed comparison of these approaches,
see [`?dplyr_by`](https://dplyr.tidyverse.org/reference/dplyr_by.html).
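The difference between the two approaches can be sketched as follows, assuming `dplyr` version 1.1.0 or later (when `.by` was introduced) and using the built-in `mtcars` data for illustration:

```{r}
library(dplyr)

# Persistent grouping: the result stays grouped until ungroup() is called
persistent <- mtcars |>
  group_by(cyl) |>
  mutate(mpg_centered = mpg - mean(mpg))
is_grouped_df(persistent)

# Per-operation grouping: `.by` applies to this mutate() call only
per_operation <- mtcars |>
  mutate(mpg_centered = mpg - mean(mpg), .by = cyl)
is_grouped_df(per_operation)

# Both approaches compute the same values
all.equal(ungroup(persistent)$mpg_centered, per_operation$mpg_centered)
```

With `group_by()`, any later `mutate()` or `filter()` on `persistent` would still operate within cylinder groups unless `ungroup()` is called first; the `.by` version leaves no grouping behind.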
## Quarto
Quarto is a system for writing documents with embedded R code and/or results:
- Read this ASAP: <https://r4ds.hadley.nz/communicate>
- Then use this for reference: <https://quarto.org/docs/reference/>
- Learn LaTeX in 30 minutes (not everything in here is relevant to Quarto): <https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes>
- LaTeX symbol reference guide: <https://oeis.org/wiki/List_of_LaTeX_mathematical_symbols>
- LaTeX commands: <https://www.overleaf.com/learn/latex/Commands>
To compile Quarto documents to PDF, first install a LaTeX distribution by running these commands once:
```{r}
#| eval: false
install.packages("tinytex")
tinytex::install_tinytex()
```
See @knuth84 for additional discussion of literate programming.
## One source file, multiple outputs
One of Quarto's excellent features is the ability to convert the same source file into multiple output formats;
in particular, I am using the same set of source files to generate an HTML website,
a PDF document, and a set of revealjs slide decks.
I use `::: notes` divs to mark chunks of text that should be omitted from the revealjs slides but included in the website and PDF formats.
## Packages
> This book espouses our philosophy of package development: anything that can be automated, should be automated. Do as little as possible by hand. Do as much as possible with functions. The goal is to spend your time thinking about what you want your package to do rather than thinking about the minutiae of package structure.
- <https://r-pkgs.org/introduction.html#:~:text=This%20book%20espouses,of%20package%20structure.>
- Read this ASAP: <https://r-pkgs.org/whole-game.html>
- Use the rest of @rpkgs as a reference
## Submitting packages to CRAN
- Read this first: <https://r-pkgs.org/release.html>
- A problems-and-solutions book is under construction: <https://contributor.r-project.org/cran-cookbook/>
## Git
94% of respondents to a
[2022 Stack Overflow survey](https://survey.stackoverflow.co/2022/#section-version-control-version-control-systems)
reported using git for version control.
[More details](https://r-pkgs.org/software-development-practices.html#sec-sw-dev-practices-git-github)
- *Happy Git with R* <https://happygitwithr.com/>
- <https://usethis.r-lib.org/articles/pr-functions.html>
- *Git Magic* <http://www-cs-students.stanford.edu/~blynn/gitmagic/>
- <https://ohshitgit.com/>
- <https://maelle.github.io/saperlipopette/>
## Spatial data science
- @spatialds
## Shiny apps
- Read @masteringshiny first
- Use @golembook as a reference
## Making the most of RStudio
Over time, explore all the tabs and menus; there are a lot of great quality-of-life features.
* Use the `History` tab to view past commands; you can rerun them or copy them into a source code file in one click. (Pressing the up arrow in the Console also recalls past commands, but less conveniently.)
## Contributing to R
Many modern R packages are developed on GitHub and welcome
bug reports and pull requests (suggested edits to source code) through
the GitHub interface.
To contribute to "base R" (the core systems), see <https://contributor.r-project.org/>